Discretization-invariant learning aims at learning in infinite-dimensional function spaces, where the learned model can take heterogeneous discretized representations of functions as inputs and/or outputs. This paper proposes a novel deep learning framework based on integral autoencoders (IAE-Net) for discretization-invariant learning. The basic building block of IAE-Net consists of an encoder and a decoder, implemented as integral transforms with data-driven kernels, together with a fully connected neural network between the encoder and decoder. This basic building block is applied in parallel in a wide multi-channel structure, which is repeatedly composed to form a deep, densely connected neural network with skip connections, yielding IAE-Net. IAE-Net is trained with randomized data augmentation that generates training data with heterogeneous structures to facilitate discretization-invariant learning. The proposed IAE-Net is applied to various applications in predictive data science, solving forward and inverse problems in scientific computing, and signal/image processing. Compared with alternatives in the literature, IAE-Net achieves state-of-the-art performance in existing applications and enables a wide range of new ones.
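To make the building block above concrete, here is a minimal PyTorch sketch of an encoder-decoder pair built from integral transforms with a data-driven kernel and a fully connected core. It is a rough illustration under assumed choices (a fixed latent grid on [0, 1], midpoint-rule quadrature, the names KernelIntegral and IAEBlock), not the authors' implementation, and it omits the multi-channel and skip-connection structure.

```python
import torch
import torch.nn as nn

class KernelIntegral(nn.Module):
    """Integral transform with a data-driven kernel:
    (Tf)(y_j) ~= sum_i k_theta(x_i, y_j) * f(x_i) * dx  (midpoint-rule quadrature)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.kernel = nn.Sequential(                     # k_theta(x, y): R^2 -> R
            nn.Linear(2, hidden), nn.GELU(), nn.Linear(hidden, 1))

    def forward(self, f, x_in, x_out):
        # f: (batch, n_in), x_in: (n_in,), x_out: (n_out,)
        pairs = torch.stack(torch.meshgrid(x_out, x_in, indexing="ij"), dim=-1)
        k = self.kernel(pairs).squeeze(-1)               # (n_out, n_in)
        dx = 1.0 / x_in.numel()                          # uniform quadrature weight on [0, 1]
        return f @ k.T * dx                              # (batch, n_out)

class IAEBlock(nn.Module):
    """Encoder -> fully connected core -> decoder; input and output grids may differ in size."""
    def __init__(self, latent_pts=64, width=128):
        super().__init__()
        self.encode, self.decode = KernelIntegral(), KernelIntegral()
        self.core = nn.Sequential(nn.Linear(latent_pts, width), nn.GELU(),
                                  nn.Linear(width, latent_pts))
        self.z = torch.linspace(0.0, 1.0, latent_pts)    # fixed latent grid

    def forward(self, f, x_in, x_out):
        h = self.encode(f, x_in, self.z)                 # heterogeneous grid -> fixed latent grid
        h = self.core(h)
        return self.decode(h, self.z, x_out)             # latent grid -> requested output grid

# Example: the same block handles two different input/output discretizations.
block = IAEBlock()
x_coarse, x_fine = torch.linspace(0, 1, 50), torch.linspace(0, 1, 200)
out = block(torch.sin(2 * torch.pi * x_coarse).unsqueeze(0), x_coarse, x_fine)
print(out.shape)  # torch.Size([1, 200])
```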
Graph Neural Networks (GNNs), originally proposed for node classification, have also motivated many recent works on edge prediction (a.k.a., link prediction). However, existing methods lack an elaborate design that accounts for two frequently overlooked distinctions between the tasks: (i) edges only constitute the topology in the node classification task, but can serve as both the topology and the supervision (i.e., labels) in the edge prediction task; (ii) node classification makes a prediction for each individual node, while edge prediction is determined by each pair of nodes. To this end, we propose a novel edge prediction paradigm named Edge-aware Message PassIng neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting technique to specify the use of each edge, where each edge is used solely as either the topology or the supervision (named a topology edge or a supervision edge, respectively). We then develop a new message passing mechanism that generates messages to source nodes (through topology edges) while being aware of target nodes (through supervision edges). To emphasize the differences between pairs connected by supervision edges and unconnected pairs, we further weight the messages to highlight those that can reflect the differences. In addition, we design a novel negative node-pair sampling trick that efficiently samples 'hard' negative instances among the supervision instances and can significantly improve performance. Experimental results verify that the proposed method significantly outperforms existing state-of-the-art models on the edge prediction task on multiple homogeneous and heterogeneous graph datasets.
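As a rough illustration of the edge-splitting idea and the negative node-pair sampling, the sketch below partitions edges into topology and supervision sets and draws 'hard' negatives by corrupting one endpoint of a supervision edge toward high-degree nodes. The split ratio, the degree-based notion of hardness, and all function names are assumptions for illustration rather than the paper's exact procedure.

```python
import random
import networkx as nx

def split_edges(g: nx.Graph, supervision_ratio: float = 0.3, seed: int = 0):
    """Assign each edge a single role: topology (message passing) or supervision (labels)."""
    rng = random.Random(seed)
    edges = list(g.edges())
    rng.shuffle(edges)
    k = int(len(edges) * supervision_ratio)
    supervision, topology = edges[:k], edges[k:]
    return topology, supervision

def sample_hard_negatives(g: nx.Graph, supervision, per_positive: int = 1, seed: int = 0):
    """Illustrative 'hard' negatives: corrupt one endpoint of a supervision edge with a node
    sampled in proportion to its degree, so the negative pair looks locally plausible."""
    rng = random.Random(seed)
    nodes, degs = zip(*g.degree())
    negatives = []
    for u, _v in supervision:
        for _ in range(per_positive):
            w = rng.choices(nodes, weights=degs, k=1)[0]
            if w != u and not g.has_edge(u, w):
                negatives.append((u, w))
    return negatives

g = nx.karate_club_graph()
topo, sup = split_edges(g)
neg = sample_hard_negatives(g, sup)
print(len(topo), len(sup), len(neg))
```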
Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage the Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate the Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which cannot be accomplished by strong baselines.
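The following deliberately tiny numpy sketch only illustrates the ordering of the two training stages described above: offline imitation of a privileged teacher on logged simulation data, followed by online correction on states the policy itself visits. The linear stand-in policy, the teacher, and the toy dynamics are all invented for illustration and bear no relation to the actual Transformer architecture or terrains used by TERT.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_action(obs, privileged):           # privileged teacher, available only in simulation
    return np.tanh(obs.mean() + privileged.mean())

class TinyPolicy:                              # stand-in for the student (Transformer) policy
    def __init__(self, dim):
        self.w = np.zeros(dim)
    def act(self, obs):
        return np.tanh(obs @ self.w)
    def fit_step(self, obs, target, lr=1e-2):  # one supervised step toward the teacher's action
        pred = self.act(obs)
        grad = (pred - target) * (1 - pred**2) * obs
        self.w -= lr * grad

dim = 8
policy = TinyPolicy(dim)

# Stage 1 -- offline pretraining: imitate the privileged teacher on logged simulation data.
for _ in range(2000):
    obs, priv = rng.normal(size=dim), rng.normal(size=4)
    policy.fit_step(obs, teacher_action(obs, priv))

# Stage 2 -- online correction: roll out the student itself, but keep regressing its actions
# toward the teacher's, so training covers the states the student actually visits.
obs = rng.normal(size=dim)
for _ in range(2000):
    priv = rng.normal(size=4)
    policy.fit_step(obs, teacher_action(obs, priv))
    obs = 0.9 * obs + 0.1 * rng.normal(size=dim) + 0.05 * policy.act(obs)
```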
This paper introduces the use of evolutionary algorithms for solving differential equations. The solution is obtained by optimizing a deep neural network whose loss function is defined by the residual terms from the differential equations. Recent studies have used stochastic gradient descent (SGD) variants to train these physics-informed neural networks (PINNs), but these methods can struggle to find accurate solutions due to optimization challenges. When solving differential equations, it is important to find the globally optimal parameters of the network, rather than just a solution that works well during training. SGD only searches along a single gradient direction, so it may not be the best approach for training PINNs with their complex optimization landscapes. In contrast, evolutionary algorithms perform a parallel exploration of different solutions in order to avoid getting stuck in local optima and can potentially find more accurate solutions. However, evolutionary algorithms can be slow, which can make them difficult to use in practice. To address this, we provide a set of five benchmark problems with associated performance metrics and baseline results to support the development of evolutionary algorithms for enhanced PINN training. As a baseline, we evaluate the performance and speed of using the widely adopted Covariance Matrix Adaptation Evolution Strategy (CMA-ES) for solving PINNs. We provide the loss and training time for CMA-ES run on TensorFlow, and CMA-ES and SGD run on JAX (with GPU acceleration) for the five benchmark problems. Our results show that JAX-accelerated evolutionary algorithms, particularly CMA-ES, can be a useful approach for solving differential equations. We hope that our work will support the exploration and development of alternative optimization algorithms for the complex task of optimizing PINNs.
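As a small end-to-end illustration of training a PINN with CMA-ES (here via the pycma package), the sketch below fits a tiny network to the ODE u'(x) = -u(x), u(0) = 1, by minimizing the residual loss with the ask/tell loop. The network size, the ODE, and all hyperparameters are assumptions chosen for brevity, not one of the paper's five benchmarks.

```python
import numpy as np
import cma  # pip install cma

xs = np.linspace(0.0, 1.0, 32)          # collocation points
H = 8                                    # hidden width; total parameters = 3*H + 1

def unpack(theta):
    w1, b1, w2, b2 = theta[:H], theta[H:2*H], theta[2*H:3*H], theta[3*H]
    return w1, b1, w2, b2

def pinn_loss(theta):
    """Residual loss for u'(x) = -u(x), u(0) = 1, with u(x) = w2 . tanh(w1*x + b1) + b2."""
    w1, b1, w2, b2 = unpack(theta)
    z = np.tanh(np.outer(xs, w1) + b1)            # (n, H) hidden activations
    u = z @ w2 + b2                               # network output at collocation points
    du = ((1.0 - z**2) * w1) @ w2                 # exact derivative of the network w.r.t. x
    residual = du + u                             # ODE residual u' + u
    u0 = np.tanh(b1) @ w2 + b2                    # boundary term u(0)
    return float(np.mean(residual**2) + (u0 - 1.0)**2)

# Gradient-free, population-based optimization of the PINN parameters with CMA-ES.
es = cma.CMAEvolutionStrategy(np.zeros(3 * H + 1), 0.5, {"maxiter": 300, "verbose": -9})
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [pinn_loss(c) for c in candidates])

w1, b1, w2, b2 = unpack(es.result.xbest)
u_hat = np.tanh(np.outer(xs, w1) + b1) @ w2 + b2
print("max abs error vs exp(-x):", np.abs(u_hat - np.exp(-xs)).max())
```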
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to a 500 FPS rate and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. In the process of building BLOOM--the Big Science Large Open-science Open-access Multilingual language model--our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform an ablation study at the billion-parameter scale comparing different modeling practices and their impact on zero-shot generalization. In addition, we study the impact of various popular pre-training corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup. All our models and code are open-sourced at https://huggingface.co/bigscience .
This paper aims to explore how to synthesize close-to-real blur so that existing video deblurring models trained on it can generalize well to real-world blurry videos. In recent years, deep learning-based methods have achieved promising success on the video deblurring task. However, models trained on existing synthetic datasets still suffer from generalization problems on real-world blurry scenes, and the factors behind this failure remain unknown. We therefore revisit the classical blur synthesis pipeline and identify the possible causes, including the shooting parameters, the blur formation space, and the image signal processor (ISP). To analyze the effects of these potential factors, we first collect an ultra-high frame-rate (940 fps) RAW video dataset as a basis for synthesizing various kinds of blur. We then propose a novel realistic blur synthesis pipeline that leverages blur formation cues in the RAW space. Through extensive experiments, we demonstrate that synthesizing blur in the RAW space and adopting the same ISP as the real-world test data can effectively eliminate the negative effects of synthetic data. Moreover, the shooting parameters of the synthesized blurry videos, e.g., the exposure time and frame rate, play important roles in improving the performance of deblurring models. Impressively, models trained on blurry data synthesized with our pipeline can obtain a gain of more than 5 dB PSNR over models trained on existing synthetic blur datasets. We believe the novel realistic blur synthesis pipeline and the corresponding RAW video dataset can help the community easily construct customized blur datasets to improve real-world video deblurring performance, rather than laboriously collecting real data pairs.
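A minimal numpy sketch of the core idea, synthesizing blur by integrating consecutive high-frame-rate RAW frames over an assumed exposure window and only then applying an ISP, is given below. The frame counts (940 fps capture, roughly 1/60 s exposure, 30 fps output) and the placeholder gamma-only ISP are illustrative assumptions; in the paper's setting the ISP should match the one that produces the real-world test videos.

```python
import numpy as np

def synthesize_blur_raw(raw_frames, exposure_frames, stride):
    """Average consecutive high-frame-rate RAW frames to mimic a longer exposure,
    then step through time with a stride to mimic a lower output frame rate.
    raw_frames: (T, H, W) linear RAW intensities, i.e. values before the ISP."""
    blurry = []
    for start in range(0, len(raw_frames) - exposure_frames + 1, stride):
        window = raw_frames[start:start + exposure_frames]
        blurry.append(window.mean(axis=0))       # light integrates over the exposure window
    return np.stack(blurry)

def toy_isp(raw, gamma=2.2):
    """Placeholder ISP (just a gamma curve); the abstract's point is that the *same* ISP
    as the real-world test videos should be used here, not this toy one."""
    return np.clip(raw, 0.0, 1.0) ** (1.0 / gamma)

# Assumed timing: 940 fps capture with ~1/60 s exposure -> ~16 sharp RAW frames per blurry
# frame, and a 30 fps output stream -> a stride of ~31 captured frames between exposures.
sharp_raw = np.random.rand(128, 64, 64).astype(np.float32)
blurry_raw = synthesize_blur_raw(sharp_raw, exposure_frames=16, stride=31)
blurry_video = toy_isp(blurry_raw)
print(blurry_raw.shape, blurry_video.shape)      # (4, 64, 64) (4, 64, 64)
```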
Conventional multi-view clustering seeks to partition data samples into clusters under the assumption that all views are fully observed. However, in practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views are available for every sample, which leads to the failure of conventional multi-view clustering methods. Clustering on such incomplete multi-view data is referred to as incomplete multi-view clustering. Owing to its promising application prospects, research on incomplete multi-view clustering has made noticeable progress in recent years. However, there is no survey that summarizes the current progress and points out future research directions. To this end, we review recent studies on incomplete multi-view clustering. Importantly, we provide several frameworks to unify the corresponding incomplete multi-view clustering methods, and conduct an in-depth comparative analysis of some representative methods from both theoretical and experimental perspectives. Finally, some open problems in the field of incomplete multi-view clustering are presented for researchers.
In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for generating optimal bimanual grasping pairs for arbitrary large objects. The dataset contains about nine million parallel-jaw grasps generated from more than 6,000 objects, each annotated with various grasp dexterity measures. In addition, we propose an end-to-end dual-arm grasp evaluation model trained on rendered scenes from this dataset. We utilize the evaluation model as a baseline to show the value of this novel and non-trivial dataset through online analysis and real robot experiments. All data and related code will be open-sourced at https://sites.google.com/view/da2dataset.
Few-shot part segmentation aims to separate the different parts of an object given only a few annotated samples. Due to the challenge of limited data, existing works mainly focus on learning classifiers over pre-trained features and fail to learn task-specific features for part segmentation. In this paper, we propose to learn task-specific features in a "pre-training"-"fine-tuning" paradigm. We conduct prompt designing to reduce the gap between the pre-training task (i.e., image generation) and the downstream task (i.e., part segmentation), so that the GAN prior obtained from generation can be leveraged for segmentation. This is achieved by projecting the part segmentation map into RGB space and interpolating between RGB segmentation maps and original images. Specifically, we design a fine-tuning strategy to progressively tune an image generator into a segmentation generator, where the supervision of the generator varies from images to segmentation maps via the interpolation. Moreover, we propose a two-stream architecture: a segmentation stream to generate task-specific features and an image stream to provide spatial constraints. The image stream can be regarded as a self-supervised autoencoder, which enables our model to benefit from large-scale support images. Overall, this work is an attempt to explore the internal relevance between generation tasks and perception tasks via prompt designing. Extensive experiments show that our model can achieve state-of-the-art performance on several part segmentation datasets.
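To illustrate the interpolation-based supervision described above, the sketch below color-codes a part-segmentation map into RGB space and blends it with the original image under an annealing coefficient that moves the target from pure image generation toward pure segmentation generation. The part palette, the linear schedule, and the function names are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

PART_COLORS = np.array([[1.0, 0.0, 0.0],      # part 0 -> red
                        [0.0, 1.0, 0.0],      # part 1 -> green
                        [0.0, 0.0, 1.0],      # part 2 -> blue
                        [1.0, 1.0, 0.0]])     # part 3 -> yellow

def seg_to_rgb(seg):
    """Project an integer part-segmentation map (H, W) into RGB space, shape (H, W, 3)."""
    return PART_COLORS[seg]

def interpolation_target(image, seg, alpha):
    """Fine-tuning target: blend of the original image and the RGB-coded segmentation map.
    alpha is annealed from 0 (pure image generation, the pre-training task) toward 1
    (pure segmentation generation, the downstream task)."""
    return (1.0 - alpha) * image + alpha * seg_to_rgb(seg)

image = np.random.rand(64, 64, 3)
seg = np.random.randint(0, 4, size=(64, 64))
for step, alpha in enumerate(np.linspace(0.0, 1.0, 5)):
    target = interpolation_target(image, seg, alpha)   # supervision drifts from image to seg map
    print(f"step {step}: alpha={alpha:.2f}, target range [{target.min():.2f}, {target.max():.2f}]")
```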